Lognormal Distribution (lognorm)#
The lognormal distribution is the canonical model for positive, right-skewed quantities that arise from multiplicative effects.
If a variable is built as a product of many small random factors, its logarithm often becomes approximately normal (by a CLT-like argument on sums), making the original variable approximately lognormal.
What you’ll learn#
what lognorm models and when it's a good choice
the PDF/CDF in clean LaTeX form
closed-form moments (mean/variance/skewness/kurtosis) and what doesn’t have a closed form (MGF/CF)
how (\mu,\sigma) control location and tail heaviness
core derivations: (\mathbb{E}[X]), (\mathrm{Var}(X)), likelihood + MLE
NumPy-only sampling and Monte Carlo checks
visual intuition via PDF/CDF/histograms
the SciPy API (scipy.stats.lognorm) and its parameterization
practical use cases + common pitfalls
Notebook roadmap#
Title & classification
Intuition & motivation
Formal definition (PDF/CDF)
Moments & properties
Parameter interpretation
Derivations ((\mathbb{E}[X]), (\mathrm{Var}(X)), likelihood)
Sampling & simulation (NumPy-only)
Visualization (PDF, CDF, Monte Carlo)
SciPy integration (scipy.stats.lognorm)
Statistical use cases
Pitfalls
Summary
import math
import numpy as np
import scipy
from scipy import stats
import plotly
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio
pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
SEED = 7
rng = np.random.default_rng(SEED)
np.set_printoptions(precision=4, suppress=True)
print("numpy ", np.__version__)
print("scipy ", scipy.__version__)
print("plotly", plotly.__version__)
numpy 1.26.2
scipy 1.15.0
plotly 6.5.2
Prerequisites & notation#
Prerequisites
comfort with basic calculus (change of variables)
basic probability (PDF/CDF, expectation)
Notation (2-parameter lognormal)
We use the standard two-parameter definition:
(X \sim \mathrm{LogNormal}(\mu,\sigma^2)) means (\log X \sim \mathcal{N}(\mu,\sigma^2)).
(\mu\in\mathbb{R}) and (\sigma>0).
A helpful identity to remember:
[ X = \exp(\mu + \sigma Z), \qquad Z\sim\mathcal{N}(0,1). ]
Mapping to SciPy
SciPy parameterizes lognorm as:
scipy.stats.lognorm(s=σ, loc=0, scale=exp(μ))
So:
[ \texttt{s} = \sigma, \qquad \texttt{scale} = e^{\mu}, \qquad \texttt{loc}=0 \;\text{(standard)}. ]
If loc is nonzero, the support becomes (x>\texttt{loc}) and the distribution is a shifted lognormal.
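A minimal sanity check of this mapping (a sketch, assuming illustrative values μ = 0.4, σ = 0.6 that are not used elsewhere, and a throwaway generator so the notebook's rng stream is untouched): the log of samples drawn from stats.lognorm(s=σ, scale=exp(μ)) should have mean ≈ μ and standard deviation ≈ σ.
# Sanity check of the SciPy mapping (illustrative values; throwaway RNG so the main stream is unchanged)
_mu_chk, _sigma_chk = 0.4, 0.6
_rv_chk = stats.lognorm(s=_sigma_chk, loc=0.0, scale=math.exp(_mu_chk))
_log_chk = np.log(_rv_chk.rvs(size=100_000, random_state=np.random.default_rng(0)))
print(f"log-mean ≈ {_log_chk.mean():.3f} (target {_mu_chk}), log-std ≈ {_log_chk.std():.3f} (target {_sigma_chk})")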
1) Title & classification#
Name: lognorm (Lognormal distribution)
Type: continuous
Support: (x \in (0,\infty)) (standard 2-parameter form)
Parameter space:
(\mu \in \mathbb{R})
(\sigma \in (0,\infty))
A 3-parameter (shifted) lognormal uses an additional location parameter (\mathrm{loc}), giving support (x>\mathrm{loc}).
2) Intuition & motivation#
What it models#
A lognormal random variable is positive and typically right-skewed. It is appropriate when variability is best thought of as multiplicative rather than additive.
If
[ X = X_0 \prod_{j=1}^m U_j, \qquad X_0, U_j > 0, ]
then
[ \log X = \log X_0 + \sum_{j=1}^m \log U_j, ]
and sums of many small, weakly dependent contributions often look approximately normal — making (X) approximately lognormal.
Typical real-world use cases#
Finance: asset prices under geometric Brownian motion (log-returns are modeled as normal)
Reliability / survival: positive durations with multiplicative heterogeneity (lognormal competes with Weibull/Gamma)
Environmental / biomedical: concentrations, exposure levels, positive measurements spanning orders of magnitude
Measurement error: multiplicative noise (e.g., (Y = X \times \varepsilon) with (\varepsilon) lognormal)
Relations to other distributions#
Normal: (\log X) is normal; many inference tasks reduce to normal theory on (\log X).
Products: product of independent lognormals is lognormal (logs add).
Gamma/Weibull: alternative positively skewed families; lognormal often has a heavier right tail than Gamma/Weibull for comparable variance (see the quick tail comparison below).
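To make the tail comparison concrete, here is a small sketch that matches a Gamma distribution to a lognormal by mean and variance (shape = mean²/var, scale = var/mean) and compares upper-tail probabilities; the parameters μ = 0, σ = 1 and the cutoff points are illustrative only.
# Illustrative tail comparison: lognormal vs a moment-matched Gamma
_mu_t, _sig_t = 0.0, 1.0
_ln = stats.lognorm(s=_sig_t, scale=math.exp(_mu_t))
_m, _v = _ln.mean(), _ln.var()
_ga = stats.gamma(a=_m**2 / _v, scale=_v / _m)  # same mean and variance as the lognormal
for _x in [5.0, 10.0, 20.0]:
    # far enough into the tail, the lognormal survival probability typically dominates
    print(f"P(X > {_x:>4}): lognormal {_ln.sf(_x):.2e}   gamma {_ga.sf(_x):.2e}")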
3) Formal definition#
Let (X \sim \mathrm{LogNormal}(\mu,\sigma^2)) with (\sigma>0).
Equivalently, let (Y=\log X). Then (Y\sim\mathcal{N}(\mu,\sigma^2)).
PDF#
For (x>0):
[ f(x\mid\mu,\sigma) = \frac{1}{x\,\sigma\sqrt{2\pi}}\exp\left(-\frac{(\ln x-\mu)^2}{2\sigma^2}\right). ]
And (f(x\mid\mu,\sigma)=0) for (x\le 0).
CDF#
For (x>0):
[ F(x\mid\mu,\sigma) = \mathbb{P}(X\le x) = \Phi\!\left(\frac{\ln x-\mu}{\sigma}\right), ]
where (\Phi) is the standard normal CDF.
4) Moments & properties#
A very useful closed form is the raw (power) moment:
[ \mathbb{E}[X^k] = \exp\left(k\mu + \tfrac{1}{2}k^2\sigma^2\right), \qquad k\in\mathbb{R}. ]
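Since the formula holds for any real (k), not just integers, here is a quick Monte Carlo spot check for a fractional power (illustrative parameters and a throwaway RNG, so the notebook's rng stream is untouched):
# Spot check of E[X^k] = exp(k*mu + 0.5*k^2*sigma^2) for a non-integer k
_mu_k, _sig_k, _k = 0.3, 0.7, 1.5
_x_k = np.exp(_mu_k + _sig_k * np.random.default_rng(2).normal(size=400_000))
print((_x_k ** _k).mean(), math.exp(_k * _mu_k + 0.5 * _k**2 * _sig_k**2))  # should agree to ~2 decimals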
Mean, variance, skewness, kurtosis#
Let (X\sim\mathrm{LogNormal}(\mu,\sigma^2)). Then:
Mean: [ \mathbb{E}[X]=\exp\left(\mu + \tfrac{1}{2}\sigma^2\right) ]
Variance: [ \mathrm{Var}(X)=\bigl(e^{\sigma^2}-1\bigr)\,\exp\left(2\mu+\sigma^2\right) ]
Skewness: [ \gamma_1 = \bigl(e^{\sigma^2}+2\bigr)\sqrt{e^{\sigma^2}-1} ]
(Excess) kurtosis: [ \gamma_2 = e^{4\sigma^2}+2e^{3\sigma^2}+3e^{2\sigma^2}-6 ]
Other useful summaries:
Median: (\mathrm{med}(X)=e^{\mu})
Mode: (\mathrm{mode}(X)=e^{\mu-\sigma^2})
Quantile: (Q(p)=\exp\bigl(\mu+\sigma\,\Phi^{-1}(p)\bigr)) (all three are checked numerically just below)
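A quick numerical check of these summaries against SciPy (a sketch with illustrative parameters μ = 0.3, σ = 0.7; the mode is located by a simple grid search over the PDF):
# Check median / quantile / mode formulas against SciPy (illustrative parameters)
_mu_s, _sig_s = 0.3, 0.7
_rv_s = stats.lognorm(s=_sig_s, scale=math.exp(_mu_s))
print("median :", math.exp(_mu_s), "vs", _rv_s.median())
print("Q(0.9) :", math.exp(_mu_s + _sig_s * stats.norm.ppf(0.9)), "vs", _rv_s.ppf(0.9))
_xg = np.linspace(1e-3, 5.0, 200_001)
print("mode   :", math.exp(_mu_s - _sig_s**2), "vs grid argmax", _xg[np.argmax(_rv_s.pdf(_xg))])  # agree to within the grid spacing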
MGF / characteristic function#
The MGF (M_X(t)=\mathbb{E}[e^{tX}]) does not exist (is infinite) for any (t>0).
For (t<0), the Laplace transform (\mathbb{E}[e^{tX}]) exists, but it has no simple elementary closed form.
The characteristic function (\varphi_X(\omega)=\mathbb{E}[e^{i\omega X}]) exists for all real (\omega), but also has no elementary closed form.
In practice, you typically work with moments, quantiles, or compute transforms numerically.
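As an illustration of computing a transform numerically, the sketch below evaluates the characteristic function at a single frequency two ways: by quadrature on the real and imaginary parts (truncating the integral at an extreme quantile), and by a Monte Carlo average. The use of scipy.integrate.quad, the parameters μ = 0, σ = 0.5, and ω = 1 are illustrative choices, not something used elsewhere in the notebook.
from scipy import integrate

# phi(w) = E[exp(i w X)] evaluated numerically (no elementary closed form exists)
_mu_c, _sig_c, _w = 0.0, 0.5, 1.0
_rv_c = stats.lognorm(s=_sig_c, scale=math.exp(_mu_c))
_upper = _rv_c.ppf(1 - 1e-12)  # truncation point; the neglected tail mass is ~1e-12
_re, _ = integrate.quad(lambda x: math.cos(_w * x) * _rv_c.pdf(x), 0.0, _upper, limit=200)
_im, _ = integrate.quad(lambda x: math.sin(_w * x) * _rv_c.pdf(x), 0.0, _upper, limit=200)
_phi_mc = np.exp(1j * _w * _rv_c.rvs(size=200_000, random_state=np.random.default_rng(1))).mean()
print(f"quadrature : {_re:.4f} + {_im:.4f}i")
print(f"Monte Carlo: {_phi_mc.real:.4f} + {_phi_mc.imag:.4f}i")  # agreement to a couple of decimals expected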
Entropy (differential, in nats)#
Using the change-of-variables relation between (X) and (Y=\log X):
[ h(X) = \mu + \tfrac{1}{2}\log(2\pi e\,\sigma^2). ]
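A one-line cross-check of this formula against SciPy's entropy method (a sketch with illustrative parameters):
# Differential entropy: closed form vs scipy.stats.lognorm(...).entropy()
_mu_h, _sig_h = 0.3, 0.7
print(_mu_h + 0.5 * math.log(2 * math.pi * math.e * _sig_h**2),
      float(stats.lognorm(s=_sig_h, scale=math.exp(_mu_h)).entropy()))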
LOG_SQRT_2PI = 0.5 * math.log(2 * math.pi)
def lognorm_logpdf(x: np.ndarray, mu: float, sigma: float) -> np.ndarray:
"""Lognormal log-PDF for x>0; returns -inf for x<=0.
Parameterization: log X ~ Normal(mu, sigma^2).
"""
x = np.asarray(x, dtype=float)
if sigma <= 0:
return np.full_like(x, -np.inf, dtype=float)
out = np.full_like(x, -np.inf, dtype=float)
mask = x > 0
if np.any(mask):
logx = np.log(x[mask])
z = (logx - mu) / sigma
out[mask] = -logx - math.log(sigma) - LOG_SQRT_2PI - 0.5 * (z * z)
return out
def lognorm_pdf(x: np.ndarray, mu: float, sigma: float) -> np.ndarray:
return np.exp(lognorm_logpdf(x, mu=mu, sigma=sigma))
def lognorm_cdf(x: np.ndarray, mu: float, sigma: float) -> np.ndarray:
x = np.asarray(x, dtype=float)
if sigma <= 0:
raise ValueError("sigma must be > 0")
out = np.zeros_like(x, dtype=float)
mask = x > 0
if np.any(mask):
z = (np.log(x[mask]) - mu) / sigma
out[mask] = stats.norm.cdf(z)
return out
def lognorm_ppf(p: np.ndarray, mu: float, sigma: float) -> np.ndarray:
p = np.asarray(p, dtype=float)
if np.any((p <= 0) | (p >= 1)):
raise ValueError("p must be in (0,1)")
if sigma <= 0:
raise ValueError("sigma must be > 0")
return np.exp(mu + sigma * stats.norm.ppf(p))
def lognorm_raw_moment(k: float, mu: float, sigma: float) -> float:
# E[X^k] = exp(k*mu + 0.5*k^2*sigma^2)
return math.exp(k * mu + 0.5 * (k * k) * (sigma * sigma))
def lognorm_mean(mu: float, sigma: float) -> float:
return lognorm_raw_moment(1.0, mu=mu, sigma=sigma)
def lognorm_var(mu: float, sigma: float) -> float:
m1 = lognorm_raw_moment(1.0, mu=mu, sigma=sigma)
m2 = lognorm_raw_moment(2.0, mu=mu, sigma=sigma)
return m2 - m1 * m1
def lognorm_skewness(sigma: float) -> float:
# depends only on sigma
a = math.exp(sigma * sigma)
return (a + 2.0) * math.sqrt(a - 1.0)
def lognorm_excess_kurtosis(sigma: float) -> float:
s2 = sigma * sigma
return math.exp(4 * s2) + 2 * math.exp(3 * s2) + 3 * math.exp(2 * s2) - 6
def lognorm_entropy(mu: float, sigma: float) -> float:
return mu + 0.5 * math.log(2 * math.pi * math.e * sigma * sigma)
def sample_lognorm(n: int, mu: float, sigma: float, rng: np.random.Generator) -> np.ndarray:
"""NumPy-only sampling via X = exp(mu + sigma Z), Z~N(0,1)."""
if n <= 0:
raise ValueError("n must be >= 1")
if sigma <= 0:
raise ValueError("sigma must be > 0")
z = rng.normal(loc=0.0, scale=1.0, size=n)
return np.exp(mu + sigma * z)
def lognorm_loglik(mu: float, sigma: float, x: np.ndarray) -> float:
x = np.asarray(x, dtype=float)
if sigma <= 0 or np.any(x <= 0):
return -np.inf
return float(lognorm_logpdf(x, mu=mu, sigma=sigma).sum())
# Quick Monte Carlo sanity check of moments
mu, sigma = 0.3, 0.7
x_mc = sample_lognorm(200_000, mu=mu, sigma=sigma, rng=rng)
mean_emp = x_mc.mean()
var_emp = x_mc.var()
mean_th = lognorm_mean(mu, sigma)
var_th = lognorm_var(mu, sigma)
{
"mean_emp": mean_emp,
"mean_theory": mean_th,
"var_emp": var_emp,
"var_theory": var_th,
"log_mean_emp": float(np.log(x_mc).mean()),
"log_std_emp": float(np.log(x_mc).std(ddof=0)),
}
{'mean_emp': 1.7248113319636609,
'mean_theory': 1.7246083823764353,
'var_emp': 1.8704011965756784,
'var_theory': 1.8806817386743675,
'log_mean_emp': 0.30066798415101526,
'log_std_emp': 0.6993513672997337}
5) Parameter interpretation#
Recall (\log X \sim \mathcal{N}(\mu,\sigma^2)).
Meaning of parameters#
(\mu) is the location in log-space:
(\mathrm{median}(X)=e^{\mu})
multiplying (X) by a constant (c>0) adds (\log c) to (\mu)
(\sigma) is the spread in log-space:
controls right-tail heaviness, skewness, and how far the mean sits above the median
increasing (\sigma) leaves the median fixed but inflates (\mathbb{E}[X]=e^{\mu+\sigma^2/2})
Shape changes (qualitative)#
Larger (\mu): shifts the distribution to the right (multiplicative scaling).
Larger (\sigma): increases dispersion and skewness; the mode moves left ((e^{\mu-\sigma^2})) while the tail gets much heavier.
# How the PDF changes with mu and sigma
mu0 = 0.0
sigmas = [0.25, 0.5, 1.0]
x_min = float(lognorm_ppf(0.001, mu=mu0, sigma=min(sigmas)))
x_max = float(lognorm_ppf(0.995, mu=mu0, sigma=max(sigmas)))
x_grid = np.linspace(0.0, x_max, 700)
fig = go.Figure()
for s in sigmas:
fig.add_trace(go.Scatter(x=x_grid, y=lognorm_pdf(x_grid, mu=mu0, sigma=s), mode="lines", name=f"σ={s:g}"))
fig.add_vline(x=math.exp(mu0), line_dash="dot", opacity=0.25) # median
fig.update_layout(
title="Lognormal PDF: increasing σ increases skew and tail weight (μ fixed)",
xaxis_title="x",
yaxis_title="f(x)",
)
fig.show()
# Changing mu mostly rescales x
mu_values = [-0.8, 0.0, 0.8]
sigma = 0.5
x_max = float(lognorm_ppf(0.995, mu=max(mu_values), sigma=sigma))
x_grid = np.linspace(0.0, x_max, 700)
fig = go.Figure()
for m in mu_values:
fig.add_trace(go.Scatter(x=x_grid, y=lognorm_pdf(x_grid, mu=m, sigma=sigma), mode="lines", name=f"μ={m:g}"))
fig.add_vline(x=math.exp(m), line_dash="dot", opacity=0.25) # median
fig.update_layout(
title="Lognormal PDF: changing μ shifts the scale (σ fixed)",
xaxis_title="x",
yaxis_title="f(x)",
)
fig.show()
6) Derivations#
A) Expectation#
Start from the definition (for (x>0)):
[ \mathbb{E}[X] = \int_0^\infty x\,\frac{1}{x\,\sigma\sqrt{2\pi}}\exp\left(-\frac{(\ln x-\mu)^2}{2\sigma^2}\right)dx. ]
Cancel the (x) and substitute (y=\ln x) (so (x=e^y), (dx=e^y dy)):
[ \mathbb{E}[X] = \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(y-\mu)^2}{2\sigma^2}\right)e^y\,dy = \mathbb{E}[e^Y],\quad Y\sim\mathcal{N}(\mu,\sigma^2). ]
Using the normal MGF (\mathbb{E}[e^{tY}]=\exp(\mu t + \tfrac{1}{2}\sigma^2 t^2)), set (t=1):
[ \mathbb{E}[X] = \exp\left(\mu + \tfrac{1}{2}\sigma^2\right). ]
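As a numerical cross-check of this derivation, the sketch below integrates (x\,f(x)) over ((0,\infty)) with scipy.integrate.quad (an assumption of this sketch, not part of the notebook's toolkit so far) and compares against (\exp(\mu+\sigma^2/2)); the parameter values are illustrative.
from scipy import integrate

# Numerical check of E[X]: integrate x * f(x) and compare with exp(mu + sigma^2 / 2)
_mu_d, _sig_d = 0.4, 0.8
_mean_quad, _err = integrate.quad(
    lambda t: t * float(lognorm_pdf(np.array([t]), mu=_mu_d, sigma=_sig_d)[0]), 0.0, np.inf
)
print(_mean_quad, math.exp(_mu_d + 0.5 * _sig_d**2))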
B) Variance#
Similarly, compute (\mathbb{E}[X^2]=\exp(2\mu+2\sigma^2)) and subtract the square of the mean:
[ \mathrm{Var}(X)=\mathbb{E}[X^2]-\mathbb{E}[X]^2 =\exp(2\mu+2\sigma^2)-\exp(2\mu+\sigma^2) =\bigl(e^{\sigma^2}-1\bigr)\exp(2\mu+\sigma^2). ]
C) Likelihood and MLE#
For iid data (x_1,\dots,x_n) with (x_i>0), the log-likelihood is
[ \ell(\mu,\sigma) = -n\log\sigma - n\tfrac{1}{2}\log(2\pi) - \sum_{i=1}^n \log x_i - \frac{1}{2\sigma^2}\sum_{i=1}^n (\log x_i-\mu)^2. ]
Let (y_i=\log x_i). Then this is exactly the normal log-likelihood for (y_i\sim\mathcal{N}(\mu,\sigma^2)) plus the Jacobian term (-\sum\log x_i) which does not depend on (\mu) or (\sigma).
Therefore the MLEs are the same as for the normal:
[ \hat\mu = \bar{y},\qquad \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n (y_i-\bar{y})^2. ]
(Notice the (1/n), not (1/(n-1)): MLE vs unbiased variance estimator.)
# MLE demo: estimate (mu, sigma) from simulated data
mu_true, sigma_true = 0.4, 0.8
n = 800
x = sample_lognorm(n, mu=mu_true, sigma=sigma_true, rng=rng)
y = np.log(x)
mu_hat = float(y.mean())
sigma_hat = float(y.std(ddof=0))
loglik_true = lognorm_loglik(mu_true, sigma_true, x)
loglik_mle = lognorm_loglik(mu_hat, sigma_hat, x)
{
"mu_true": mu_true,
"mu_hat": mu_hat,
"sigma_true": sigma_true,
"sigma_hat": sigma_hat,
"loglik_true": loglik_true,
"loglik_mle": loglik_mle,
}
{'mu_true': 0.4,
'mu_hat': 0.4328500520233074,
'sigma_true': 0.8,
'sigma_hat': 0.8032840930173821,
'loglik_true': -1306.8813146389489,
'loglik_mle': -1306.1933977477677}
7) Sampling & simulation (NumPy-only)#
Because (\log X) is normal, sampling is extremely simple:
sample (Z\sim\mathcal{N}(0,1))
return (X = \exp(\mu + \sigma Z))
This is exact (not an approximation), and uses only a normal RNG plus exponentiation.
Why it works: If (Z\sim\mathcal{N}(0,1)) then (\mu+\sigma Z\sim\mathcal{N}(\mu,\sigma^2)), and exponentiating turns a normal into a lognormal.
# Simulation check: log(samples) should look Normal(mu, sigma^2)
mu, sigma = 0.3, 0.7
n = 50_000
samples = sample_lognorm(n, mu=mu, sigma=sigma, rng=rng)
log_samples = np.log(samples)
z = (log_samples - mu) / sigma
# Histogram of log-samples + normal PDF overlay
grid = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 500)
fig = px.histogram(
log_samples,
nbins=70,
histnorm="probability density",
title="log(X) is normal: histogram of log-samples",
labels={"value": "y = log(x)"},
)
fig.add_trace(go.Scatter(x=grid, y=stats.norm.pdf(grid, loc=mu, scale=sigma), mode="lines", name="Normal PDF"))
fig.update_layout(yaxis_title="density")
fig.show()
{
"log_mean_emp": float(log_samples.mean()),
"log_std_emp": float(log_samples.std(ddof=0)),
"z_mean_emp": float(z.mean()),
"z_std_emp": float(z.std(ddof=0)),
}
{'log_mean_emp': 0.3002472380387193,
'log_std_emp': 0.6995177336636519,
'z_mean_emp': 0.0003531971981704151,
'z_std_emp': 0.9993110480909314}
8) Visualization (PDF, CDF, Monte Carlo)#
Because lognormal tails can span orders of magnitude, it’s often useful to look at:
PDF on a linear x-axis (to see the mode)
CDF (to see where most mass lies)
Monte Carlo samples (histogram) compared to the theoretical PDF
# Monte Carlo samples vs theoretical PDF
mu, sigma = 0.2, 0.9
n = 80_000
samples = sample_lognorm(n, mu=mu, sigma=sigma, rng=rng)
# Plot most of the mass (truncate extreme tail for visualization)
x_max = float(np.quantile(samples, 0.995))
x_grid = np.linspace(0.0, x_max, 600)
fig = px.histogram(
samples[samples <= x_max],
nbins=80,
histnorm="probability density",
title=f"Lognormal: samples vs PDF (n={n}, μ={mu:g}, σ={sigma:g})",
labels={"value": "x (tail truncated at 99.5% quantile)"},
)
fig.add_trace(go.Scatter(x=x_grid, y=lognorm_pdf(x_grid, mu=mu, sigma=sigma), mode="lines", name="true PDF"))
fig.update_layout(yaxis_title="density")
fig.show()
# PDF and CDF for multiple sigmas
mu = 0.0
sigmas = [0.25, 0.5, 1.0]
x_max = float(lognorm_ppf(0.995, mu=mu, sigma=max(sigmas)))
x_grid = np.linspace(0.0, x_max, 700)
fig_pdf = go.Figure()
fig_cdf = go.Figure()
for s in sigmas:
fig_pdf.add_trace(go.Scatter(x=x_grid, y=lognorm_pdf(x_grid, mu=mu, sigma=s), mode="lines", name=f"σ={s:g}"))
fig_cdf.add_trace(go.Scatter(x=x_grid, y=lognorm_cdf(x_grid, mu=mu, sigma=s), mode="lines", name=f"σ={s:g}"))
fig_pdf.update_layout(title="Lognormal PDF (μ fixed)", xaxis_title="x", yaxis_title="f(x)")
fig_cdf.update_layout(title="Lognormal CDF (μ fixed)", xaxis_title="x", yaxis_title="F(x)")
fig_pdf.show()
fig_cdf.show()
# Empirical CDF vs true CDF
mu, sigma = 0.2, 0.9
n = 30_000
samples = sample_lognorm(n, mu=mu, sigma=sigma, rng=rng)
xs = np.sort(samples)
ys = np.arange(1, n + 1) / n
x_grid = np.linspace(0.0, float(np.quantile(xs, 0.995)), 700)
fig = go.Figure()
fig.add_trace(go.Scatter(x=xs, y=ys, mode="lines", name="empirical CDF"))
fig.add_trace(go.Scatter(x=x_grid, y=lognorm_cdf(x_grid, mu=mu, sigma=sigma), mode="lines", name="true CDF"))
fig.update_layout(
title=f"Empirical CDF vs true CDF (n={n}, μ={mu:g}, σ={sigma:g})",
xaxis_title="x",
yaxis_title="F(x)",
)
fig.show()
9) SciPy integration (scipy.stats.lognorm)#
SciPy’s lognorm distribution uses:
s = (\sigma) (shape)
scale = (e^{\mu})
loc = location shift (0 for the standard lognormal)
So if (X\sim\mathrm{LogNormal}(\mu,\sigma^2)), then:
rv = stats.lognorm(s=sigma, loc=0.0, scale=math.exp(mu))
Methods you’ll commonly use:
rv.pdf(x), rv.logpdf(x)
rv.cdf(x)
rv.rvs(size=..., random_state=...)
stats.lognorm.fit(data, floc=0.0)
from scipy.stats import lognorm
mu, sigma = 0.2, 0.9
rv = lognorm(s=sigma, loc=0.0, scale=math.exp(mu))
x_grid = np.linspace(0.0, float(lognorm_ppf(0.995, mu=mu, sigma=sigma)), 600)
# Compare our PDF/CDF with SciPy
pdf_max_abs_diff = float(np.max(np.abs(rv.pdf(x_grid) - lognorm_pdf(x_grid, mu=mu, sigma=sigma))))
cdf_max_abs_diff = float(np.max(np.abs(rv.cdf(x_grid) - lognorm_cdf(x_grid, mu=mu, sigma=sigma))))
# Sampling
samples_scipy = rv.rvs(size=20_000, random_state=rng)
# Fit (standard lognormal: fix loc=0)
shape_hat, loc_hat, scale_hat = lognorm.fit(samples_scipy, floc=0.0)
mu_hat = math.log(scale_hat)
sigma_hat = shape_hat
{
"pdf_max_abs_diff": pdf_max_abs_diff,
"cdf_max_abs_diff": cdf_max_abs_diff,
"fit_loc_hat": float(loc_hat),
"fit_mu_hat": mu_hat,
"fit_sigma_hat": sigma_hat,
"true_mu": mu,
"true_sigma": sigma,
}
{'pdf_max_abs_diff': 3.3306690738754696e-16,
'cdf_max_abs_diff': 2.220446049250313e-16,
'fit_loc_hat': 0.0,
'fit_mu_hat': 0.20712347539101095,
'fit_sigma_hat': 0.8994332226964946,
'true_mu': 0.2,
'true_sigma': 0.9}
10) Statistical use cases#
A) Hypothesis testing#
A common workflow is:
transform data with (y_i=\log x_i)
check whether (y_i) is plausibly normal (QQ plot / normality tests)
if reasonable, use normal-theory inference for (\mu,\sigma) on (y)
Example tests:
Normality on log-data: Shapiro-Wilk, Anderson-Darling, etc.
Testing the median: since (\mathrm{median}(X)=e^{\mu}), testing a median corresponds to testing (\mu) on (\log X).
B) Bayesian modeling#
A lognormal likelihood is often convenient in hierarchical models for positive outcomes.
In log-space, (y_i=\log x_i) is normal, so you can use conjugate priors like the Normal-Inverse-Gamma for ((\mu,\sigma^2)).
C) Generative modeling#
Lognormal noise is a natural choice for multiplicative perturbations:
[ X_{\text{observed}} = X_{\text{true}} \times \varepsilon, \qquad \varepsilon\sim\mathrm{LogNormal}(0,\tau^2) ]
It also composes nicely: products of independent lognormals are lognormal.
# A) Hypothesis testing ideas via log-transform
mu, sigma = 0.0, 0.7
n = 500
x = sample_lognorm(n, mu=mu, sigma=sigma, rng=rng)
y = np.log(x)
# 1) Normality test on y = log(x)
shapiro_stat, shapiro_p = stats.shapiro(y)
# 2) Testing the median m0: median(X)=exp(mu) -> H0: mu = log(m0)
m0 = math.exp(mu) # true median in this simulation
mu0 = math.log(m0)
t_stat, t_p = stats.ttest_1samp(y, popmean=mu0)
# 3) QQ plot for y against Normal
(osm, osr), (slope, intercept, r) = stats.probplot(y, dist="norm")
fig = go.Figure()
fig.add_trace(go.Scatter(x=osm, y=osr, mode="markers", name="log-data quantiles"))
line_x = np.array([osm.min(), osm.max()])
fig.add_trace(go.Scatter(x=line_x, y=intercept + slope * line_x, mode="lines", name="fit line"))
fig.update_layout(
title="QQ plot: log(x) vs Normal",
xaxis_title="theoretical quantiles",
yaxis_title="sample quantiles of log(x)",
)
fig.show()
{
"shapiro_stat": float(shapiro_stat),
"shapiro_p": float(shapiro_p),
"t_test_stat_for_mu": float(t_stat),
"t_test_p": float(t_p),
"qq_r": float(r),
}
{'shapiro_stat': 0.993361660448632,
'shapiro_p': 0.026728693326586762,
't_test_stat_for_mu': 1.2115097780447683,
't_test_p': 0.22627348624645166,
'qq_r': 0.9972006534532416}
# B) Bayesian modeling in log-space: Normal-Inverse-Gamma prior
# Model: y_i = log x_i ~ Normal(mu, sigma^2)
# Prior hyperparameters
mu0 = 0.0
kappa0 = 1.0
alpha0 = 2.0
beta0 = 1.0
# Simulated data
mu_true, sigma_true = 0.3, 0.6
n = 200
x = sample_lognorm(n, mu=mu_true, sigma=sigma_true, rng=rng)
y = np.log(x)
y_bar = float(y.mean())
ssq = float(((y - y_bar) ** 2).sum())
# Posterior update (Normal-Inverse-Gamma)
kappa_n = kappa0 + n
mu_n = (kappa0 * mu0 + n * y_bar) / kappa_n
alpha_n = alpha0 + 0.5 * n
beta_n = beta0 + 0.5 * ssq + (kappa0 * n * (y_bar - mu0) ** 2) / (2 * kappa_n)
# Sample from posterior
M = 20_000
sigma2_samps = stats.invgamma(a=alpha_n, scale=beta_n).rvs(size=M, random_state=rng)
mu_samps = rng.normal(loc=mu_n, scale=np.sqrt(sigma2_samps / kappa_n))
# Posterior for the median m = exp(mu)
median_samps = np.exp(mu_samps)
ci_95 = np.quantile(median_samps, [0.025, 0.975])
fig = px.histogram(
median_samps,
nbins=80,
histnorm="probability density",
title="Posterior for median exp(mu) under N-Inv-Gamma prior (log-space)",
labels={"value": "median = exp(mu)"},
)
fig.add_vline(x=math.exp(mu_true), line_dash="dot", opacity=0.5)
fig.add_vline(x=float(ci_95[0]), line_dash="dash", opacity=0.35)
fig.add_vline(x=float(ci_95[1]), line_dash="dash", opacity=0.35)
fig.show()
{
"posterior_mu_mean": float(mu_samps.mean()),
"posterior_sigma_mean": float(np.sqrt(sigma2_samps).mean()),
"median_true": math.exp(mu_true),
"median_95_CI": [float(ci_95[0]), float(ci_95[1])],
}
{'posterior_mu_mean': 0.33776111088030053,
'posterior_sigma_mean': 0.6733830448193903,
'median_true': 1.3498588075760032,
'median_95_CI': [1.2778456708726476, 1.538595654436261]}
# C) Generative modeling: products of lognormals are lognormal
# If X1 ~ LN(mu1, s1^2) and X2 ~ LN(mu2, s2^2) independently,
# then X1*X2 ~ LN(mu1+mu2, s1^2 + s2^2).
mu1, s1 = 0.2, 0.4
mu2, s2 = -0.1, 0.7
n = 120_000
x1 = sample_lognorm(n, mu=mu1, sigma=s1, rng=rng)
x2 = sample_lognorm(n, mu=mu2, sigma=s2, rng=rng)
prod = x1 * x2
mu_pred = mu1 + mu2
sigma_pred = math.sqrt(s1 * s1 + s2 * s2)
# Compare moments in log-space
log_prod = np.log(prod)
# Visual check: histogram vs predicted PDF (truncate tail)
x_max = float(np.quantile(prod, 0.995))
x_grid = np.linspace(0.0, x_max, 700)
fig = px.histogram(
prod[prod <= x_max],
nbins=90,
histnorm="probability density",
title="Product of independent lognormals is lognormal (empirical vs theory)",
labels={"value": "x (tail truncated at 99.5% quantile)"},
)
fig.add_trace(go.Scatter(x=x_grid, y=lognorm_pdf(x_grid, mu=mu_pred, sigma=sigma_pred), mode="lines", name="predicted PDF"))
fig.update_layout(yaxis_title="density")
fig.show()
{
"mu_pred": mu_pred,
"sigma_pred": sigma_pred,
"log_mean_emp": float(log_prod.mean()),
"log_std_emp": float(log_prod.std(ddof=0)),
}
{'mu_pred': 0.1,
'sigma_pred': 0.8062257748298549,
'log_mean_emp': 0.1007461339910088,
'log_std_emp': 0.805805795463638}
11) Pitfalls#
Nonpositive data: the standard lognormal requires (x>0). Zeros/negatives break the (\log) transform and make likelihood (-\infty).
Parameterization confusion:
many texts use ((\mu,\sigma)) for the normal parameters of (\log X)
SciPy uses lognorm(s=σ, scale=exp(μ), loc=...)
Tail truncation in plots: lognormal tails can be huge; for readability it’s common to truncate at a high quantile (e.g., 99.5%).
Overflow/underflow:
sampling uses exp(mu + sigma*z), which can overflow if (\mu+\sigma z) is too large
evaluating the PDF can underflow for extreme (x) or large (\sigma); prefer logpdf for likelihood work (see the sketch after this list)
Fitting with loc free: allowing loc to vary can yield a shifted fit; if you expect the standard lognormal, use floc=0.0.
MGF-based methods: the MGF diverges for (t>0), so techniques relying on an MGF neighborhood around 0 can fail.
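A small sketch of the nonpositive-data and underflow points above (the extreme value below is deliberate and purely illustrative):
# Nonpositive data: the log-likelihood is -inf as soon as any observation is <= 0
print(lognorm_loglik(0.0, 1.0, np.array([1.0, 0.0, 2.5])))  # -inf

# Far in the right tail the PDF underflows to exactly 0.0, but the log-PDF stays finite and usable
_x_ext = np.array([1e6])
print("pdf    :", lognorm_pdf(_x_ext, mu=0.0, sigma=0.25))     # [0.] after underflow
print("logpdf :", lognorm_logpdf(_x_ext, mu=0.0, sigma=0.25))  # finite, about -1540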
12) Summary#
lognorm is continuous on ((0,\infty)) and is defined by (\log X\sim\mathcal{N}(\mu,\sigma^2)).
PDF: (f(x)=\frac{1}{x\sigma\sqrt{2\pi}}\exp\bigl(-\frac{(\ln x-\mu)^2}{2\sigma^2}\bigr)), CDF: (F(x)=\Phi\bigl((\ln x-\mu)/\sigma\bigr)).
Raw moments: (\mathbb{E}[X^k]=\exp(k\mu+\tfrac{1}{2}k^2\sigma^2)); in particular (\mathbb{E}[X]=e^{\mu+\sigma^2/2}) and (\mathrm{Var}(X)=(e^{\sigma^2}-1)e^{2\mu+\sigma^2}).
Median (=e^{\mu}), mode (=e^{\mu-\sigma^2}); increasing (\sigma) dramatically increases skewness and tail heaviness.
MLEs are normal-theory MLEs on (y=\log x): (\hat\mu=\bar y), (\hat\sigma^2=\frac{1}{n}\sum(y_i-\bar y)^2).
SciPy mapping: stats.lognorm(s=sigma, loc=0, scale=exp(mu)), with lognorm.fit(data, floc=0) for the standard case.